Beyond Batch Processing: Towards Real-Time and Streaming Big Data
Author
Abstract
Today, big data is generated from many sources, and there is huge demand for storing, managing, processing, and querying it. The MapReduce model and its open source implementation, Hadoop, have proven themselves as the de facto solution for big data processing. Hadoop is inherently designed for batch, high-throughput processing jobs. Although Hadoop is well suited to batch jobs, there is increasing demand for non-batch processing of big data, such as interactive jobs, real-time queries, and big data streams. Since Hadoop is not well suited to these non-batch workloads, new solutions have been proposed to address these challenges. In this article, we discuss two categories of these solutions: real-time processing and stream processing for big data. For each category, we discuss its paradigms, strengths, and differences from Hadoop. We also introduce some practical systems and frameworks for each category. Finally, we report some simple experiments that show the effectiveness of some of these solutions compared with available Hadoop-based solutions.
Related resources
Design and Test of the Real-time Text mining dashboard for Twitter
One of today's major research trends in the field of information systems is the discovery of implicit knowledge hidden in datasets that are currently being produced at high speed, in large volumes, and in a wide variety of formats. Data with such features is called big data. Extracting, processing, and visualizing this huge amount of data has today become one of the concerns of data science scholar...
Chapter 1. Key Technologies for Big Data Stream Computing
1.1 Introduction. Big data computing is a new trend in computing, driven by the growing quantity and increasing speed of data. In general, there are two main mechanisms for big data computing, i.e., big data stream computing and big data batch computing. Big data stream computing is a model of straight-through computing, such as Storm [1] and S4 [2] which do for stream computing wh...
BIDCEP: A Vision of Big Data Complex Event Processing for Near Real Time Data Streaming
This position paper aims to trigger a technical discussion by proposing a conceptual architecture for big data streaming integrated with complex event processing (BiDCEP). BiDCEP expands the Lambda and Kappa (LK) architectures for big data streaming to fit the complex event processing (CEP) and event management domains of enterprise IT. BiDCEP links CEP components as defined in previous work of...
Optimizing Query Processing in Batch Streaming System
With the growing need to process “big data” in real time, modern stream processing systems should be able to operate at cloud scale. This poses challenges for building large-scale stream processing systems. First, processing tasks should be efficiently distributed to worker nodes with small overhead. Second, streaming data processing should be highly available, despite that failures ...
FPGA-based hardware acceleration for Real-Time Big Data systems
This paper discusses how FPGA acceleration is used within the JUNIPER platform. JUNIPER is a processing platform to enable the development of real-time, Big Data systems. Unlike existing Big Data approaches which are based on either batch processing, or streaming processing that is “fast enough”, the JUNIPER platform integrates a range of technologies that increase the predictability of the sys...
Journal title:
- Computers
Volume 3, Issue
Pages -
Publication date: 2014